Summary. Phylogenetic trees constructed using the predicted amino acid sequences
of putative proteins of coronavirus HKU1 (CoV-HKU1) revealed that CoV-HKU1
formed a distinct branch among group 2 coronaviruses. Of the 14 trees from p65
to nsp10, nine showed that CoV-HKU1 was clustered with murine hepatitis virus.
From nsp11 onwards, the topologies of the trees changed dramatically: of the eight
trees from nsp11 to N, seven showed the CoV-HKU1 branch as the first branch.
The codon usage patterns of CoV-HKU1 differed significantly from those of other
group 2 coronaviruses. Split decomposition analysis revealed that recombination
events had occurred between CoV-HKU1 and other coronaviruses.

Introduction 

It has been estimated that coronaviruses [human coronaviruses 229E (HCoV-229E)
and OC43 (HCoV-OC43)] cause about 5-30% of respiratory tract infections.
In late 2002 and 2003, Severe Acute Respiratory Syndrome (SARS), caused
by SARS coronavirus (SARS-CoV), resulted in more than 750 deaths [12, 15,
16, 17, 22-24]. In early 2004, a novel coronavirus associated with respiratory
tract infections, human coronavirus NL63 (HCoV-NL63), was discovered [3, 20].
As a result of their unique mechanism of replication, coronaviruses have a high
frequency of recombination [9, 10, 13, 14].
Coronaviruses have been divided into three groups, with HCoV-229E and HCoV-NL63
being group 1 coronaviruses and HCoV-OC43 a group 2 coronavirus [11]. SARS-CoV
was initially proposed to constitute a distinct group of coronaviruses [15, 17].
However, more extensive phylogenetic analysis showed that SARS-CoV probably
represents a distant relative of group 2 coronaviruses [2, 18]. Further in silico
analysis also predicted that SARS-CoV could be a product of recombination between
mammalian and avian coronaviruses [19].

Recently, we described the discovery of a novel coronavirus associated
with pneumonia, coronavirus HKU1 (CoV-HKU1) [21]. Based on analysis of the
putative chymotrypsin-like protease (3CLpro), RNA-dependent RNA polymerase
(Pol), helicase, hemagglutinin-esterase (HE), spike (S), envelope (E), membrane
(M) and nucleocapsid (N), CoV-HKU1 is a member of group 2 coronaviruses.
However, the origin of CoV-HKU1 is still unknown. In this study, we performed
a detailed phylogenetic analysis of CoV-HKU1, predicted possible recombination
events and discuss the origin of CoV-HKU1.


Materials and methods 

The predicted amino acid (a.a.) sequences of p65, conserved portions of nsp1 [papain-like
protease 1 (PL1pro), Appr-1-p processing enzyme family (A1pp), papain-like protease 2
(PL2pro), hydrophobic domain 1 (HD1) and hydrophobic domain 2 (HD2)], nsp2-7, nsp9-13,
HE, S, E, M and N were extracted from the CoV-HKU1 genome sequence (GenBank accession
no. AY597011) [21]. The corresponding a.a. sequences of murine hepatitis virus (MHV),
HCoV-OC43, bovine coronavirus (BCoV), porcine hemagglutinating encephalomyelitis virus
(PHEV), rat sialodacryoadenitis coronavirus (SDAV) and puffinosis virus (PV) were extracted
from the complete genome sequences of MHV (GenBank accession no. AF201929), HCoV-OC43
(GenBank accession no. AY585229) and BCoV (GenBank accession no. NC_003045), and from the
sequences of PHEV, SDAV and PV available in GenBank. The a.a. sequence of HE of MHV
was extracted from MHV strain JHM (GenBank accession no. BAA00661) because the HE
gene in MHV (GenBank accession no. AF201929) stops prematurely after the 97th a.a.
Phylogenetic trees were constructed using the neighbour-joining method with ClustalX
1.83. The corresponding a.a. sequences of HCoV-229E were used as outgroups, except for
p65 and HE, which are not available in the genome of HCoV-229E; for p65 and HE, the
corresponding a.a. sequences of SARS-CoV and influenza C virus, respectively, were used as
outgroups. Phylogenetic trees were not constructed for p28 or for the predicted hypothetical
proteins of ORF4 and ORF8 in CoV-HKU1 because no a.a. sequences suitable as outgroups
could be found.
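The tree-building step can be illustrated with a minimal pure-Python sketch of the textbook neighbour-joining algorithm (Saitou and Nei). It is not the ClustalX 1.83 implementation: it assumes an already computed, symmetric distance matrix, joins at each step the pair that minimises the Q-criterion, and returns an unrooted tree. Rooting on an outgroup such as HCoV-229E and bootstrapping are not shown. The function names and the toy matrix are hypothetical.

```python
from itertools import combinations


def neighbour_joining(names, matrix):
    """Neighbour-joining on a symmetric distance matrix (list of lists).

    Internal nodes are tuples ((child, branch_length), (child, branch_length)).
    Returns (node_a, node_b, length) for the final edge; the tree is unrooted.
    """
    d = {a: {b: float(matrix[i][j]) for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    nodes = list(names)
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}
        # Q-criterion: join the pair minimising (n - 2) d(i, j) - r(i) - r(j)
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        # Branch lengths from the joined nodes to the new internal node u
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        u = ((i, li), (j, lj))
        d[u] = {u: 0.0}
        for k in nodes:
            if k != i and k != j:
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        nodes = [k for k in nodes if k != i and k != j] + [u]
    a, b = nodes
    return a, b, d[a][b]


def to_newick(node):
    """Render a node as Newick text; leaves are plain strings."""
    if isinstance(node, str):
        return node
    return "(" + ",".join(f"{to_newick(c)}:{l:g}" for c, l in node) + ")"


def tree_to_newick(result):
    a, b, length = result
    return f"({to_newick(a)},{to_newick(b)}:{length:g});"


names = ["A", "B", "C", "D"]
matrix = [[0, 3, 5, 5], [3, 0, 6, 6], [5, 6, 0, 2], [5, 6, 2, 0]]
print(tree_to_newick(neighbour_joining(names, matrix)))  # ((A:1,B:2),(C:1,D:1):3);
```

Because the toy matrix is additive, the sketch recovers the tree that generated it, including all branch lengths.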

The amino-terminal 800 a.a. residues of the S proteins of various group 1 coronaviruses
[porcine transmissible gastroenteritis virus (TGEV), HCoV-NL63 and HCoV-229E], various
group 2 coronaviruses (PHEV, SDAV, MHV, HCoV-OC43 and BCoV), infectious bronchitis
virus (IBV) (a group 3 coronavirus), SARS-CoV and CoV-HKU1 were aligned using ClustalX
1.83. The presence and positions of conserved cysteine residues in the various peptides were
compared.
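The cysteine comparison can be expressed as a scan over the columns of a multiple alignment. This sketch assumes the N-terminal S regions have already been aligned (for example in ClustalX) and are held as equal-length gapped strings, with the query sequence (e.g. CoV-HKU1) aligned to the same columns; it reports which cysteines conserved across a reference group are present or absent in the query. The function names are hypothetical.

```python
def conserved_cysteine_columns(alignment):
    """Alignment columns in which every sequence carries a cysteine.

    alignment: dict name -> gapped sequence; all sequences must be equal length.
    """
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    return [i for i in range(length) if all(s[i] == "C" for s in seqs)]


def residue_number(seq, column):
    """1-based residue position of an alignment column in the ungapped sequence."""
    return len(seq[: column + 1].replace("-", ""))


def compare_query(reference, query):
    """Split the reference-conserved cysteine columns into (shared, missing) in the query."""
    cols = conserved_cysteine_columns(reference)
    shared = [c for c in cols if query[c] == "C"]
    missing = [c for c in cols if query[c] != "C"]
    return shared, missing
```

Converting shared columns back to residue numbers with `residue_number` gives positions comparable to the residue scale drawn for S.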

Correspondence analysis was used to compare the variation in codon usage patterns of the
different genes among group 2 coronaviruses in a multidimensional space [5]. All available
sequences of ORF1ab, HE, S, M and N of MHV, HCoV-OC43, BCoV, PHEV, SDAV, PV and
SARS-CoV were downloaded from GenBank (Table 1). Analysis of codon usage in these
sequences and in the corresponding ones of CoV-HKU1 was performed using CodonW
(http://www.molbiol.ox.ac.uk/cu/), with each gene represented as a 59-dimensional vector
corresponding to the 59 codons that remain after excluding AUG (the only codon for
methionine), UGG (the only codon for tryptophan) and the three stop codons. The ORF for E
was excluded because the gene is too short.
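The 59-dimensional representation can be sketched as a codon count vector: 64 codons minus the three stop codons, AUG and UGG leaves 59. This is not CodonW's code. It assumes an in-frame ORF supplied as a DNA or RNA string and counts raw codon occurrences; whether raw counts or relative synonymous codon usage were analysed is not stated above, so raw counts are an assumption. Codons containing ambiguous bases are skipped. Names are hypothetical.

```python
from itertools import product

STOP_CODONS = {"UAA", "UAG", "UGA"}
# AUG (Met) and UGG (Trp) have no synonymous alternatives and are excluded as well
EXCLUDED = STOP_CODONS | {"AUG", "UGG"}
SENSE_CODONS = sorted(
    "".join(c) for c in product("ACGU", repeat=3) if "".join(c) not in EXCLUDED
)


def codon_vector(orf):
    """59-element codon count vector for an in-frame ORF (DNA or RNA string)."""
    seq = orf.upper().replace("T", "U")
    counts = dict.fromkeys(SENSE_CODONS, 0)
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in counts:  # skips excluded codons and ambiguous bases such as N
            counts[codon] += 1
    return [counts[c] for c in SENSE_CODONS]
```

For example, `codon_vector("ATGGCTGCTTGGTAA")` counts only the two GCU codons, because AUG, UGG and UAA are excluded.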

To delineate the importance of recombination in the evolution of CoV-HKU1, split decomposition
analysis was performed. Deduced a.a. sequences of group 1, 2 and 3 coronaviruses and
SARS-CoV available in GenBank that were homologous to 3CLpro, Pol, helicase, HE,
S, ORF4, E, M and N of CoV-HKU1 [21] were retrieved. Split decomposition analysis was
performed with SplitsTree version 3.2 [7] using Hamming distances, and the resulting networks
are presented with equal edge lengths.
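Split decomposition can be sketched directly from its definition (Bandelt and Dress): a bipartition A|B of the taxa is kept as a weighted split when its isolation index is positive, and two kept splits that are incompatible cannot both lie on a tree, which is what appears as a reticulation in a SplitsTree network. The brute-force enumeration below is exponential in the number of taxa and is meant only for small examples; it is not SplitsTree's algorithm. The input distances (for instance pairwise Hamming distances between aligned a.a. sequences) and all names are assumptions.

```python
from itertools import combinations


def to_metric(pairs):
    """Symmetric dict-of-dicts distance from {(x, y): distance}; diagonal is 0."""
    d = {}
    for (x, y), v in pairs.items():
        d.setdefault(x, {x: 0.0})[y] = float(v)
        d.setdefault(y, {y: 0.0})[x] = float(v)
    return d


def isolation_index(d, a_side, b_side):
    """Bandelt-Dress isolation index of the split a_side | b_side."""
    best = float("inf")
    for a1 in a_side:
        for a2 in a_side:
            for b1 in b_side:
                for b2 in b_side:
                    top = max(d[a1][b1] + d[a2][b2],
                              d[a1][b2] + d[a2][b1],
                              d[a1][a2] + d[b1][b2])
                    best = min(best, top - d[a1][a2] - d[b1][b2])
    return best / 2


def d_splits(taxa, d, eps=1e-12):
    """All splits with a positive isolation index, as (A, B, weight)."""
    taxa = list(taxa)
    first, rest = taxa[0], taxa[1:]
    splits = []
    for k in range(len(rest)):  # side A always holds `first`; side B is never empty
        for extra in combinations(rest, k):
            a_side = frozenset((first,) + extra)
            b_side = frozenset(taxa) - a_side
            weight = isolation_index(d, a_side, b_side)
            if weight > eps:
                splits.append((a_side, b_side, weight))
    return splits


def compatible(s, t):
    """Two splits are compatible when one of the four side intersections is empty."""
    return any(not (x & y) for x in s[:2] for y in t[:2])


def has_reticulation(splits):
    """True if any two weighted splits are incompatible (network is not a tree)."""
    return any(not compatible(s, t) for s, t in combinations(splits, 2))
```

With a tree-like (additive) distance on four taxa only compatible splits survive, whereas a distance in which both AB|CD and AC|BD are supported yields two incompatible splits and hence a reticulation.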


Results 

The genome organizations of CoV-HKU1 and other group 2 coronaviruses are
shown in Fig. 1a. Phylogenetic trees were constructed using the predicted a.a.
sequences of putative proteins and polypeptides of CoV-HKU1 and other group 2
coronaviruses (Fig. 1b). These included p65, conserved portions of nsp1 (PL1pro,
A1pp, PL2pro, HD1 and HD2), nsp2-7, nsp9-13, HE, S, E, M and N. All trees
revealed that CoV-HKU1 formed a distinct branch among group 2 coronaviruses.
Interestingly, of the 14 trees from p65 to nsp10, nine (64%) (p65, HD1, HD2, nsp3,
nsp4, nsp6, nsp7, nsp9 and nsp10) showed that CoV-HKU1 was clustered with MHV
(Fig. 1b). In contrast, of the eight trees from nsp11 to N, seven (88%) showed the
CoV-HKU1 branch as the first branch among group 2 coronaviruses (Fig. 1b).

Comparison of the cysteine residues in the N-terminal 800 a.a. residues of S
in CoV-HKU1 with those in the different groups of coronaviruses revealed that
almost all the conserved cysteine residues of group 2 coronaviruses were present
in CoV-HKU1 (Fig. 2a), supporting the conclusion that CoV-HKU1 is a member of
group 2 coronaviruses.

The numbers of ORF1ab, HE, S, M and N sequences of the group 2 coronaviruses
used for correspondence analysis are shown in Table 1. The results of the



Fig. 1. Genome organization and phylogenetic analysis of CoV-HKU1. a Genome
organization of CoV-HKU1 (GenBank accession no. AY597011), MHV (GenBank accession
no. AF201929), HCoV-OC43 (GenBank accession no. AY585229) and BCoV (GenBank
accession no. NC_003045). The homologous regions used for phylogenetic analysis are
shaded. b Phylogenetic analysis of p65, conserved portions of nsp1 (PL1pro, A1pp, PL2pro,
HD1 and HD2), nsp2-7, nsp9-13, HE, S, E, M and N in group 2 coronaviruses. The trees were
constructed by the neighbour-joining method using Jukes-Cantor correction, and bootstrap
values were calculated from 1000 trees. The numbers of a.a. positions included in the analysis
were 578 (p65), 204 (PL1pro), 107 (A1pp), 212 (PL2pro), 421 (HD1), 496 (HD2), 303 (nsp2),
287 (nsp3), 89 (nsp4), 197 (nsp5), 110 (nsp6), 137 (nsp7), 928 (nsp9), 595 (nsp10),
521 (nsp11), 374 (nsp12), 299 (nsp13), 424 (HE), 1287 (S), 84 (E), 226 (M) and 445 (N).
The scale bar indicates the estimated number of substitutions per 5 or 10 a.a., as indicated.
The corresponding a.a. sequences of HCoV-229E were used as the outgroups, except for p65
and HE, for which the corresponding a.a. sequences of SARS-CoV and influenza C virus,
respectively, were used as the outgroups.
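The Jukes-Cantor correction and the 1000 bootstrap replicates mentioned in the caption can be illustrated as follows. This sketch assumes the 20-state (amino acid) form of the Jukes-Cantor formula, d = -(19/20) ln(1 - 20p/19), applied to the uncorrected proportion p of differing aligned residues; how ClustalX 1.83 computed the correction and treated gaps is not stated here, so skipping gapped positions is an assumption and the function names are hypothetical. Each bootstrap replicate resamples alignment columns with replacement; a tree would then be built from every replicate, and the bootstrap value of a clade is the fraction of replicate trees that contain it.

```python
import math
import random


def p_distance(s, t):
    """Proportion of differing residues over aligned positions where neither has a gap."""
    pairs = [(x, y) for x, y in zip(s, t) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(p, states=20):
    """Jukes-Cantor-type correction d = -b ln(1 - p / b), with b = (states - 1) / states."""
    b = (states - 1) / states
    if p >= b:
        return math.inf  # saturated: the distance cannot be estimated
    return -b * math.log(1 - p / b)


def bootstrap_replicate(alignment, rng):
    """One bootstrap pseudo-alignment: columns resampled with replacement."""
    names = list(alignment)
    length = len(alignment[names[0]])
    columns = [rng.randrange(length) for _ in range(length)]
    return {n: "".join(alignment[n][c] for c in columns) for n in names}
```

Resampling whole columns, rather than residues independently, keeps the residues of each site together across sequences, which is what makes each replicate a valid pseudo-alignment.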




P. C. Y. Woo et al. 


[Fig. 2a labels: Group 1 (TGEV, HCoV-NL63, HCoV-229E); Group 2; Group 3; SARS-CoV; PHEV; SDAV; MHV; HCoV-OC43; BCoV; CoV-HKU1]




Fig. 2. Analysis of cysteine positions in the N-terminal 800 a.a. residues of S and of codon
usage patterns of CoV-HKU1. a Schematic representation of cysteine positions (♦) in the
N-terminal domain of S in CoV-HKU1 in comparison with those in other coronaviruses.
Conserved cysteine residues of S in different coronaviruses are joined by solid lines. The bar
indicates the a.a. residue positions on S. b Scatter plot of the scores for the codon usage
patterns of ORF1ab, HE, S, M and N of MHV, HCoV-OC43, BCoV, PHEV, SDAV, PV and
CoV-HKU1 on the first and second axes.


correspondence analysis with respect to axes 1 and 2 are shown in Fig. 2b. Axes 1
and 2 explained 36.6% and 19.3% of the variation in codon usage, respectively.
For ORF1ab, HE, S and M, the scores on axis 1 of group 2 coronaviruses other
than CoV-HKU1 were clustered between -0.16 and 0.28, whereas those of CoV-HKU1
were clustered between -0.40 and -0.24 (Fig. 2b). For N, the scores on axis 1
of group 2 coronaviruses other than CoV-HKU1 were clustered between 0.48 and
0.57, whereas that of CoV-HKU1 was 0.11 (Fig. 2b). These results indicate that the
codon usage patterns of the genes in CoV-HKU1 differed significantly from those in
other group 2 coronaviruses.
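The correspondence analysis behind these axis scores can be sketched in pure Python, assuming a table with one row per gene and one column per codon (for example 59-codon count vectors). The sketch computes standardized residuals, extracts the leading axes of the small gene-by-gene matrix by power iteration with deflation, and reports the row (gene) principal coordinates together with the fraction of total inertia each axis explains, the quantity behind statements such as "axis 1 explained 36.6% of the variation". It is not CodonW's implementation; scaling conventions differ between programs, and the sign of each axis is arbitrary. Names and the toy table are hypothetical.

```python
import math


def _top_eigen(m, max_iter=10000, tol=1e-14):
    """Largest eigenpair of a symmetric positive semi-definite matrix by power iteration."""
    n = len(m)
    v = [float(i + 1) for i in range(n)]  # arbitrary non-degenerate start vector
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(max_iter):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v
        w = [x / norm for x in w]
        converged = sum((a - b) ** 2 for a, b in zip(w, v)) < tol
        v = w
        if converged:
            break
    lam = sum(v[i] * sum(m[i][j] * v[j] for j in range(n)) for i in range(n))
    return lam, v


def correspondence_analysis(table, n_axes=2):
    """Gene scores on the first axes and the fraction of inertia each axis explains.

    table: rows = genes, columns = codon counts.
    """
    keep = [j for j in range(len(table[0])) if any(row[j] for row in table)]
    counts = [[row[j] for j in keep] for row in table]
    total = sum(sum(row) for row in counts)
    r = [sum(row) / total for row in counts]  # row (gene) masses
    c = [sum(row[j] for row in counts) / total for j in range(len(keep))]  # column masses
    # Standardized residuals S = D_r^(-1/2) (P - r c^T) D_c^(-1/2)
    s = [[(counts[i][j] / total - r[i] * c[j]) / math.sqrt(r[i] * c[j])
          for j in range(len(keep))] for i in range(len(counts))]
    # Work with the small gene-by-gene matrix S S^T
    m = [[sum(a * b for a, b in zip(s[i], s[k])) for k in range(len(s))]
         for i in range(len(s))]
    total_inertia = sum(m[i][i] for i in range(len(m)))
    scores = [[0.0] * n_axes for _ in counts]
    explained = []
    for k in range(n_axes):
        lam, u = _top_eigen(m)
        lam = max(lam, 0.0)
        explained.append(lam / total_inertia)
        for i in range(len(counts)):
            # Row principal coordinate: u_i * singular value / sqrt(row mass)
            scores[i][k] = u[i] * math.sqrt(lam) / math.sqrt(r[i])
        # Deflate so that the next pass finds the next axis
        m = [[m[i][j] - lam * u[i] * u[j] for j in range(len(m))]
             for i in range(len(m))]
    return scores, explained
```

On a toy table such as `[[10, 2, 5], [20, 4, 10], [1, 8, 2], [3, 3, 9]]`, the first two genes have proportional codon profiles and therefore receive identical scores, and the mass-weighted mean score on each axis is zero.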
Table 1. Numbers of ORF1ab, hemagglutinin-esterase (HE), spike (S), membrane (M) and
nucleocapsid (N) sequences in the various groups of coronaviruses used for correspondence
analysis

a HCoV-OC43, human coronavirus OC43; MHV, murine hepatitis virus; BCoV, bovine
coronavirus; SDAV, rat sialodacryoadenitis coronavirus; PHEV, porcine hemagglutinating
encephalomyelitis virus; PV, puffinosis virus; SARS-CoV, SARS coronavirus; CoV-HKU1,
human coronavirus HKU1

Split decomposition analysis revealed that recombination events had occurred
between CoV-HKU1 and other group 2 coronaviruses in 3CLpro, Pol, helicase,
HE, S, ORF4, E and M (Fig. 3). No evidence of recombination was found between
the N of CoV-HKU1 and those of other group 2 coronaviruses.

Discussion 

CoV-HKU1 is a distinct member of group 2 coronaviruses. Both phylogenetic
analysis of 22 protein coding regions (Fig. 1b) and analysis of the conserved
cysteine residues in the amino-terminal region of the S proteins (Fig. 2a) confirmed
that CoV-HKU1 is a group 2 coronavirus. Furthermore, phylogenetic analysis of
the 22 protein coding regions revealed 10-54% a.a. differences between each
protein coding region of CoV-HKU1 and the corresponding region of the most
closely related sequence, indicating that CoV-HKU1 is distinct from the other
group 2 coronaviruses. This conclusion was further supported by the results of
correspondence analysis of codon usage (Fig. 2b).

Recombination events were common between CoV-HKU1 and other group 2
coronaviruses. Coronaviruses have a high frequency of homologous RNA
recombination, which has been observed both in tissue culture [10, 14] and in
experimentally infected animals [8]. In split decomposition analysis, recombination
events result in reticulations instead of simple branching structures. As shown in
Fig. 3, recombination was particularly frequent in CoV-HKU1 and MHV compared
with other group 2 coronaviruses such as BCoV and HCoV-OC43. The particularly
high recombination frequency in MHV [1] is in line with evidence of extensive
inter-strain recombination, as shown by the large number of reticulations in various
ORFs of the different MHV strains (Fig. 3). Complete genome sequencing of
additional CoV-HKU1 isolates and further split decomposition analysis would shed
light on whether CoV-HKU1 behaves more like MHV or like BCoV and HCoV-OC43.
CoV-HKU1 may have originated from one major recombination event and numerous
minor recombination events among group 2 coronaviruses. In feline coronavirus,
the site of recombination has been pinpointed by multiple alignment to a region of
about 50 nucleotides in the M gene [6]. For recombination between different strains
of MHV, in vitro studies have shown variable sites and rates of recombination, with
the recombination frequency in the S gene being threefold that in the polymerase
gene [4, 14]. In CoV-HKU1, nine of the 14 phylogenetic trees constructed using
deduced a.a. sequences from p65 to nsp10 showed that CoV-HKU1 was clustered
with MHV (Fig. 1b). Interestingly, the topologies of the phylogenetic trees changed
dramatically from nsp11 onwards. Of the eight trees from nsp11 to N, seven revealed
that the CoV-HKU1 branch appeared as the first branch among the group 2
coronaviruses (Fig. 1b) (P < 0.01 by chi-square test). One plausible explanation is
that a major recombination event took place in the region between nsp10 and nsp11
when CoV-HKU1 first appeared. However, this recombination event was not evident
in a multiple alignment of the junction between nsp10 and nsp11 (data not shown).
This is because, although CoV-HKU1 clusters more closely with MHV from p65 to
nsp10, the difference between the phylogenetic distances from CoV-HKU1 to MHV
and from CoV-HKU1 to BCoV/HCoV-OC43 is not marked (Fig. 1b), in contrast to
what was observed in feline coronavirus [6]. Furthermore, bootscanning analysis of
the whole genome did not reveal any putative recombination breakpoint (data not
shown). We speculate that this could be due to numerous minor recombination
events between p65 and nsp10, such as between p65 and nsp1-PL1pro, between
nsp1-PL2pro and nsp1-HD1, between nsp4 and nsp5, and between nsp5 and nsp6.
These minor events would explain why CoV-HKU1 clustered with MHV in only nine
of the 14 phylogenetic trees constructed using deduced a.a. sequences from p65 to
nsp10, whereas four of these 14 trees showed the CoV-HKU1 branch as the first
branch among the group 2 coronaviruses.
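The chi-square comparison of tree topologies can be reproduced only under an assumption, because the contingency table is not given in the text. The sketch uses a hypothetical 2 × 2 table built from counts quoted in this section: four of the 14 trees from p65 to nsp10 and seven of the eight trees from nsp11 to N showed the CoV-HKU1 branch as the first branch. It computes Pearson's statistic without continuity correction and obtains the 1-degree-of-freedom tail probability from the complementary error function. The function name is hypothetical.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square statistic (no continuity correction) and P for a 2 x 2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(X >= stat) = erfc(sqrt(stat / 2))
    return stat, math.erfc(math.sqrt(stat / 2))


# Hypothetical table from counts quoted in the text:
# rows: trees from p65 to nsp10, trees from nsp11 to N
# columns: CoV-HKU1 branch first among group 2 coronaviruses, not first
table = [[4, 10], [7, 1]]
stat, p = chi_square_2x2(table)
print(f"chi-square = {stat:.2f}, P = {p:.4f}")
```

With this table the statistic is about 7.07 and P is below 0.01, consistent with the reported value; two expected counts equal 4, below the common rule-of-thumb minimum of 5, so an exact test would be a reasonable cross-check.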